ESearch: Incorporating Text Corpus and Structured Knowledge for Open Domain Entity Search
Abstract
This paper introduces ESearch, an open domain entity search system that finds a list of entities relevant to an open domain entity search query (a natural language question). The system is built on a Wikipedia text corpus and the structured DBpedia knowledge base. Entities are first ranked by a model that combines context matching (based on the contexts of entities in the unstructured text corpus) with category matching (based on the types of entities in the structured knowledge base). They are then re-ranked by a component supported by blind feedback or user feedback on entities. We show that category matching is critical for search performance and that the re-ranking component can improve performance substantially. Category matching, however, needs some query entity types (especially specific entity types) as input, and it is often hard for users to provide specific entity types because they may not be familiar with how the types of desired entities are defined in the structured knowledge base. In ESearch, we design an effective ranking model of entity types to facilitate blind feedback and user feedback on desired entity types for category matching, so that users can effectively perform entity search without explicitly providing any query entity types as input.
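The interplay between context matching, category matching, and blind feedback on entity types can be illustrated with a small sketch. This is not the ranking model from the paper, whose formulas are not given in the abstract. It assumes a simple linear interpolation of two scores and a frequency-based guess of query types, and every name in it (`rank_entities`, `blind_feedback_types`, `weight`, `top_k`, the toy scores and types) is hypothetical.

```python
from collections import Counter


def rank_entities(context_scores, entity_types, query_types, weight=0.5):
    """Rank entities by a linear mix of context and category scores.

    Assumption: linear interpolation stands in for the paper's model.
    """
    ranked = []
    for entity, ctx in context_scores.items():
        types = entity_types.get(entity, set())
        # Category score: fraction of query types that the entity carries.
        cat = len(types & query_types) / len(query_types) if query_types else 0.0
        ranked.append((entity, (1 - weight) * ctx + weight * cat))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


def blind_feedback_types(ranked, entity_types, top_k=3, num_types=1):
    """Guess query types as the most frequent types among the top_k entities."""
    counts = Counter()
    for entity, _ in ranked[:top_k]:
        counts.update(entity_types.get(entity, set()))
    return {t for t, _ in counts.most_common(num_types)}


# Toy data (hypothetical): context scores as if derived from a text corpus,
# entity types as if read from a structured knowledge base.
context_scores = {"Paris": 0.9, "Lyon": 0.8, "Seine": 0.7, "Nice": 0.4}
entity_types = {
    "Paris": {"City"},
    "Lyon": {"City"},
    "Seine": {"River"},
    "Nice": {"City"},
}

# Pass 1: no query types given, so only context matching contributes.
initial = rank_entities(context_scores, entity_types, query_types=set())

# Blind feedback: infer the desired type from the top-ranked entities.
inferred = blind_feedback_types(initial, entity_types)

# Pass 2: re-rank with the inferred type used for category matching.
reranked = rank_entities(context_scores, entity_types, inferred)
reranked_names = [name for name, _ in reranked]
```

In this toy run the user supplies no entity types. The type inferred from the first pass promotes entities of that type in the second pass, which is the effect the abstract attributes to feedback on entity types.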
Similar resources
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
Query Architecture Expansion in Web Using Fuzzy Multi Domain Ontology
Due to the growth of the web, there are many challenges in establishing a general framework for data mining and retrieving structured data from the Web. Creating an ontology is a step towards solving this problem. The ontology raises the main entity and the concept of any data in data mining. In this paper, we tried to propose a method for applying the "meaning" of the search system, but the problem ...
Life-iNet: A Structured Network-Based Knowledge Exploration and Analytics System for Life Sciences
Search engines running on scientific literature have been widely used by life scientists to find publications related to their research. However, existing search engines in the life-science domain, such as PubMed, have limitations when applied to exploring and analyzing factual knowledge (e.g., disease-gene associations) in massive text corpora. These limitations are mainly due to the problems ...
Corpus based coreference resolution for Farsi text
"Coreference resolution" or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreference when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...
A'lam Corpus: A Standard Named Entity Corpus for the Persian Language
Named entity recognition (NER) is a natural language processing (NLP) problem that is mainly used in text summarization, data mining, data retrieval, question answering, machine translation, and document classification systems. A NER system is tasked with determining the boundaries of each named entity, recognizing its type, and classifying it into predefined categories. The categories of named...